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ABSTRACT 

An automatic Bayesian Kepler periodogram has been developed for identifying and 
characterizing multiple planetary orbits in precision radial velocity data. The peri- 
odogram is powered by a parallel tempering MCMC algorithm which is capable of 
efficiently exploring a multi-planet model parameter space. The periodogram employs 
an alternative method for converting the time of an observation to true anomaly that 
enables it to handle much larger data sets without a significant increase in compu- 
tation time. Improvements in the periodogram and further tests using data from HD 
208487 have resulted in the detection of a second planet with a period of 909^^g2d, an 
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eccentricity of 0.37° o^20' * semi-major axis of 1.87^q'\4 AU and an Msinz = 0.45^q ^ 
Mj. The revised parameters of the first planet are period = 129.8 ± 0.4d, eccentricity 
= 0.20 ±0.09, semi-major axis = 0.51 ±0.02 AU and M sini = 0.41 ±0.05 Mj. Partic- 
ular attention is paid to several methods for calculating the model marginal likelihood 
which is used to compare the probabilities of models with different numbers of planets. 

Key words: Extrasolar planets, Bayesian methods, model selection, time series anal- 
ysis, periodogram, HD 208487. 



1 INTRODUCTION 

The discovery of multiple planets orbiting the Pulsar 
PSR B1257-I-12 (Wolszczan & Frail, 1992), ushered in 
an exciting new era of astronomy. Fifteen years later, 
over 200 extra-solar planets have been discovered by 
a variety of techniques, including precision radial ve- 
locity measurements which have detected the major- 
ity of planets to date (Extrasolar Planets Encyclopedia, 
|http:/ /vo. obspm.fr/exoplanetes/encyclo/index.php). It is to 
be expected that continued monitoring and increased pre- 
cision will permit the detection of lower amplitude plane- 
tary signatures. The increase in parameters needed to model 
multiple planetary systems is motivating efforts to improve 
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Much of the recent work has highlighted a Bayesian MCMC 
approach as a way to better understand parameter uncer- 
tainties and degeneracies. 

Gregory (2005a, b & c) presented a Bayesian MCMC 
algorithm that makes use of parallel tempering to efficiently 
explore the full range of model parameter space starting 
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from a random location. It is able to identify any signif- 
icant periodic signal component in the data that satisfies 
Kepler's laws and thus functions as a Kepler periodogramQ 
This eliminates the need for a separate periodogram search 
for trial orbital periods which typically assume a sinusoidal 
model for the signal that is only correct for a circular or- 
bit. In addition, the Bayesian MCMC algorithm provides 
full marginal parameters distributions for all the orbital el- 
ements that can be determined from radial velocity data. 
The samples from the parallel chains can also be used to 
compu te the marginal likelihood for a given model (ICregorvl 
l2005al ) for use in computing the Bayes factor that is needed 
to compare models with different numbers of planets. The 
parallel tempering MCMC algorithm employed in this work 
includes a control system that automates the selection of ef- 
ficient Gaussian parameter proposal distributions. This fea- 
ture makes it practical to carry out blind searches for mul- 
tiple planets simultaneously. 

This paper outlines improvements to the parallel tem- 
pering MCMC algorithm that allow for improved mixing 



^ Fo llowing on from Bretthorst's pioneering work llBretthorstI 
Il988t) . many other Bayesian periodograms have been devel- 
oped. Several examples, based on different prior informatio n, are 
given by Brett horst (2001, 20031. iGregorv fc Loredol lll992f) , and 
iGreeorvl lll999l) . 
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and more efficient convergence. In addition several different 
methods for computing tlie model marginal likeli hood are 
compa red. We also confirm our earlier discovery (jGreeorvl 
bOOSd ) of a second planet in HD 208487. 

Some of the analysis presented in this paper was based 
on the original 31 rad i al vel ocity meas urements (old data) 
given in lTinnev et~al] (|2005h . Recentlv. [Butler et al.1 (|2006l l 
published a revised radial velocity data set based on an im- 
proved pipeline used to convert the raw spectra to radial 
velocities. This new data set also includes 4 new measure- 
ments. Most of the results pertain to the new data set, but 
some of the results dealing with model selection in Section [7] 
make use of the old data set and this is indicated in the text. 



2 PARALLEL TEMPERING 

A simple Metropolis-Hastings MCMC algorithm can run 
into difficulties if the target probability distribution is multi- 
modal with widely separated peaks. It can fail to fully ex- 
plore all peaks which contain significant probability, espe- 
cially if some of the peaks are very narrow. This is frequently 
the case with extrasolar planet data which is typically very 
sparsely sampled. The problem is somewhat similar to the 
one encountered in finding a global minimum in a nonlin- 
ear model fitting problem. One solution to finding a global 
minimum is to use simulated annealing by introducing a 
temperature parameter T which is gradually decreased. 

In parallel tempering, multiple copies of the MCMC 
simulation are run in parallel, each at a different tempera- 
ture. Mathematically, we can describe these tempering dis- 
tributions by 

7r(X|73,/3,Mi,/) = C p{X\Mi,I)p{D\Mi,X,lf 
= Cp(X|Mi,/)x 

exp(/3 \n[p{D\Nh,X,I)\), (1) 
for < /3 < 1 

where X stands for the set of model parameters. The nor- 
malization constant, C, is unimportant and will be dropped. 
Rather than use a temperature which varies from 1 to in- 
finity, we use its reciprocal, 13 = 1/T, and refer to as the 
tempering parameter. Thus (3 varies from 1 to zero. 

One of the simulations, corresponding to (3 = 1, 
is the desired target probability distribution. The other 
simulations correspond to a ladder of higher temperature 
distributions. Let equal the number of parallel MCMC 
simulations. At intervals, a pair of adjacent simulations on 
this ladder are chosen at random and a proposal made to 
swap their parameter states. A Monte Carlo acceptance 
rule determines the probability for the proposed swap to 
occur. For (3 = 1, the distribution is the desired target 
distribution which is referred to as the cold sampler. For 
(3 <^ 1, the distribution is much flatter. This swap allows 
for an exchange of information across the population of 
parallel simulations. In the higher temperature simulations, 
radically different configurations can arise, whereas in 
higher /3 (lower temperature) states, a configuration is 
given the chance to refine itself. Some experimentation 
is needed to refine suitable choices of (3 values, which 
can be assessed by examining the swap acceptance rate 
between adjacent simulations. If adjacent simulations do 



not have some overlap the swap rate between them will 
be very low. The number of (3 values required depends 
on the application. For parameter estimation purposes 
a typical value of = 12. It is important that the 
iterations from the lowest value of (3 explore the full range 
of the prior parameter space. For HD 208487, the set (3 = 
{0.05, 0.1, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.70, 0.80, 0.90, 1.0} 
proved useful for parameter estimation and achieved a typ- 
ical acceptance rate between adjacent levels of ^ 50%. The 
mean number of iterations between swap proposals was set 
= 8. Final inference is based on samples drawn from the 
/3 = 1.0 simulation. 

It is possible to use the samples from hotter simulations 
to evaluate the marginal (glo bal) likelihood n eeded for model 
selection (see Section 12.7 of lGregorvll2005al ) . Marginal like- 
lihoods estimated in this way typically require many more 
parallel simulations. For HD 208487, 34 (3 levels were used 
spanning the range (3 = 10~* to 1.0. This is discussed more 
in Section [T] 

2.1 Proposal distributions 

In Metropolis-Hastings versions of MCMC, parameter pro- 
posals are drawn from a proposal distribution. In this anal- 
ysis, independent Gaussian proposal distributions, one for 
each parameter, were employed. Of course, for sparse data 
sets it can often be the case that some of the parameters are 
highly correlated resulting in inefiicient sampling from inde- 
pendent Gaussians. One solution is to use combinations of 
model parameters that are more independent. More on this 
later. In general, the ct's of these Gaussian proposal distri- 
butions are different because the parameters can be very dif- 
ferent entities. Also if the ct's are chosen to small, successive 
samples will be highly correlated and it will require many 
iterations to obtain an equilibrium set of samples. If the 
cr's are too large, then proposed samples will very ra r ely be 
accepted. Based on empirical studies, iRoberts et al.l (|l997l ) 
recommend calibrating the acceptance rate to about 25% 
for a high-dimensional model and to about 50% for mod- 
els of 1 or 2 dimensions. Although these studies were based 
on a multinormal target probability distributions, they have 
proven a useful guideline for our application as well. 

The process of choosing a set of useful proposal cr's 
when dealing with a large number of different parameters 
can be very time consuming. In parallel tempering MCMC, 
the problem is compounded because of the need for a sep- 
arate set of proposal a's for each simulation. We have au- 
tomated this process using a two stage statistical control 
system (CS) in which the error signal is proportional to the 
difference between the current acceptance rate and a target 
acceptance rate, typically 25%. In the first stage an initial 
set of proposal ct's (~ 10% of the prior range for each param- 
eter) are separately perturbed to determine an approximate 
gradient in the acceptance rate with respect to the proposal 
(j's. The cr's are then jointly modified by a small increment 
in the direction of this gradient. This is done for each of the 
parallel simulations or chains as they are sometimes called. 
When the cr's are large all the MCMC simulations explore 
the full prior distribution and locate significant probability 
peaks in the joint parameter space. As the proposal cr's de- 
crease these peaks are more efficiently explored in the (3 = 1 
simulation. This annealing of the proposal cr's typically takes 
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place over the first 5,000 to 150,000 (unthinned) iterations 
for one planet and first 5,000 to 300,000 iterations for two 
planets. This may seem like an excessive number of itera- 
tions but keep in mind that we are dealing with sparse data 
sets that can have multiple, widely separated probability 
peaks and we want the MCMC to locate the most signifi- 
cant probability peak before finalizing the choice of proposal 
cr's. 

Although the acceptance rate for the final joint set of 
parameter cr's is achieved, it often happens that a subset of 
the proposal cr's will be too small leading to excessive cor- 
relation in the MCMC iterations for these parameters. The 
second stage CS corrects for this. In general, the burn-in 
period occurs within the span of the first stage CS, i.e., the 
significant peaks in the joint parameter probability distribu- 
tion are found, and the second stage improves the choice of 
proposal cr's for the highest probability parameter set. Oc- 
casionally, a new higher probability parameter set emerges 
at a later iteration. The second phase of the control system 
can detect this and compute a new set of proposal cr's. If this 
happens the control system resets the burn-in period to in- 
clude all previous iterations. The useful MCMC simulation 
data is obtained after the CS is switched off. Although inclu- 
sion of the control system may result in a somewhat longer 
effective burn-in period, there is a huge saving in time be- 
cause it eliminates many trial runs to manually establish a 
suitable set of proposal cr's. 



3 RE-PARAMETERIZATION 

Ford (2006) examined the effect of a variety of re- 
parameterizations for the Kepler model on the MCMC con- 
vergence speed. By far the biggest improvement was ob- 
tained with re-parameterizations that involved u>, the argu- 
ment of periastron, and Mo, the mean anomaly at initial 
epoch. This is because for low eccentricity orbits u) can be 
poorly constrained but the observational data can often bet- 
ter constrain the lj + Mo . 

In our analysis the predicted radial velocity is given 



v{U) = V + K[cos{9{U + xP)+^} + ec 
and involves the 6 unknowns 



(2) 



V = a constant velocity. 

K = velocity semi-amplitude. 

P — the orbital period. 

e — the orbital eccentricity. 

ui — the longitude of periastron. 

X = the fraction of an orbit, prior to the start of data 
taking, that periastron occurred at. Thus, xP ~ the number 
of days prior to ti — that the star was at periastron, for 
an orbital period of P days. 

6{ti + xP) = the angle of the star in its orbit relative 
to periastron at time ti, also called the true anomaly. 

We utilize this form of the equation because we obtain 
the dependence of 6 on ti by solving the conservation of 
angular momentum equation 



de 2TT[l + ecose{t,+xP)f 



= 0. 



(3) 



dt P{1 - e2)3/2 

Our algorithm is implemented in Mathematica and it proves 



faster for Mathematica to solve this differential equation 
than solve the equations relating the true anomaly to the 
mean anomaly via the eccentric anomaly. Mathematica gen- 
erates an accurate interpolating function between t and 6 
so the differential equation does not need to be solved sepa- 
rately for each ti. Evaluating the interpolating function for 
each ti is very fast compared to solving the differential equa- 
tion, so the algorithm should be able to handle much larger 
samples of radial velocity data than those currently available 
without a significant increase in computational time. 

Instead of varying x and cu at each MCMC iteration, we 
varied tp — 2-KX + i^ and (j) = 2nx — i^, motivated by the work 
of Ford (2006). is well determined for all eccentricities. Al- 
though <j!> is not well determined for low eccentricities, it is 
at least orthogonal to the t/i parameter. It is easy to demon- 
strate that uniform sampling of in the interval to 2tt and 
uniform sampling of in the interval ~2n to +2n results in 
uniform coverage of x in the interval to 1 and uniform 
coverage of u from to 2n using the relations 



X ~ Modulus 
ui = Modulus 



An ' 



(4) 



Restricting (j) to the interval to 27r produces an hourglass 
coverage of half the x, ^ plane. This re-parameterization, 
together with the additional second stage CS, achieved good 
mixing of the MCMC iterations for a wide range of orbital 
eccentricities. 



4 FREQUENCY SEARCH 

For the Kepler model with sparse data, the target probabil- 
ity distribution can be very spiky. This is particularly a prob- 
lem for the orbital period parameters which span roughly 6 
decades. In general, the sharpness of the peak depends in 
part on how many periods fit within the duration of the 
data. The previous implementation of the parallel temper- 
ing algorithm employed a proposal distribution for P which 
was a Gaussian in the logarithmic of P. This resulted in a 
constant fractional period resolution instead of a fixed abso- 
lute resolution, increasing the probability of detecting nar- 
row spectral peaks at smaller values of P. However, this 
proved not to b e entirely satisfacto ry because for the HD 
73526 data set of iTinnev eta\] (120031 ) . one of the three prob- 
ability pe aks (the highest ) was not detected in two out of 
five trials (|Gregorvll2o"05bl ). 

Our latest algorithm implements the search in fre- 
quency space for the following reasons. In a Bayesian analy- 
sis, the width of a spectral peak, which reflects the accuracy 
of the frequency estimate, is determined by the duration of 
the data, the signal-to-noise (S /N) ratio and t he number 
of data points. More precisely (|Gregorvl l2005al . iBretthorsO 
[l988% for a sinusoidal signal model, the standard deviation 
of the spectral peak, Sf, for a S/N > 1, is given by 



Sf^ (1.6|T^/iv)"' 



Hz, 



(5) 



where T — the data duration in s, and A'^ — the number 
of data points in T. The thing to notice is that the width 
of any peak is independent of the frequency of the peak. 
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Thus the same frequency proposal distribution will be effi- 
cient for all frequency peaks. This is not the case for a period 
search where the width of a spectral peak is oc P^. Not only 
is the width of the peak independent of f, but the spac- 
ing of peaks is constant in frequency (roughly A/ ~ 
which is a another motivation for sear ching in frequency 
space (e.g. ■ IScarglelll98 j , ICumminell2004 ) ■ With a frequency 
search strategy, a re-analysis of the original HD 73526 data 
set resulted in all three peaks being detected in five out of 
five trials. 



5 CHOICE OF PRIORS 

In a Bayesian analysis we need to specify a suitable prior 
for each parameter. We first address the question of what 
prior to use for frequency for multi-planet models. In this 
work, the lower cutoff in period of Id is chosen to be less 
than the smallest orbital period of known planets but some- 
what larger than the Roche limit which occurs at 0.2d for 
a lOMjup planet around a IMq star. The upper period 
period cutoff of lOOOyr is much longer than any known ex- 
trasolar planet, but corresponds roughly to a period where 
perturbations from passin g stars and the galactic tide would 
disrupt the planet's orbit (|Ford fc Gregorvll2006l ) . For a sin- 
gle planet model we use a Jeffreys prior because the prior 
period (frequency) range spans almost 6 decades. A Jeffreys 
prior corresponds to a uniform probability density in In /. 
This says that the true frequency is just as likely to be in the 
bottom decade as the top. The Jeffreys prior can be written 
in two equivalent ways. 

din/ 



p(ln/lA/i,J) dln/ = 



pif\M,I) df - 



df 



f Hfa/fL 



(6) 



(7) 



What form of frequency prior should we use for a mul- 
tiple planet model? We first develop the prior to be used in 
a frequency search strategy where we constrain the frequen- 
cies in an n planet search such that (/^ ^ /i ^ /2 • • • ^ 
fn ^ /h)- From the product rule of probability theory and 
the above frequency constraints we can write 

p(ln /i , In /2 , ■ • ■ In /„ I Af„ , J) = p(ln M M„ , /) 

xp(ln I In /„ , M„ , /) ■ ■ ■ p(ln /a | In /s , M„ , 7) 
xp(ln/i|ln/2,M„,/). (8) 

For model selection purpose we need to use a normalized 
prior which translates to the requirement that 

fin iff 

p(ln /i , In /a , • ■ • In /„ I M„ , /)d In /i ■ • ■ d In /„ = 1 . (9) 

We assume that p(ln /i, In /a, • • • In /„|M„, /) is equal to a 
constant k everywhere within the prior volume. We can solve 
for k from the integral equation 

k dlnf„ dln/„_i--- / dln/i = l. (10) 

JlnfL JlnfL In /i 

The solution to equation (|10[) is 



The joint frequency prior is then 
p(ln/i,ln/2,---ln/„lA/„,/) = 



(12) 



Mfn/fL)]- 

Expressed as a prior on frequency, equation (lllf) becomes 



p(,fi,./2,---/„!A/„,/) = 



/l/2---/n H/h/Zl)]" 



(13) 



We note that a similar result, involving the factor n\ in the 
numerator, was obtained by Bretthorst (2003) in connection 
with a uniform frequency prior. 

Two different approaches to searching in the frequency 
parameters were employed in this work. In the first approach 
(a): an upper bound on /i ^ /2 (P2 ^ Pi) was utilized to 
maintain the identity of the two frequencies. In the second 
more successful approach (b): both /i and /a were allowed 
to roam over the entire frequency range and the parame- 
ters re-labeled afterwards. In this second approach nothing 
constrains /i to always be below /2 so that degenerate pa- 
rameter peaks can occur. For a two planet model there are 
twice as many peaks in the probability distribution possible 
compared with (a). For a n planet model, the number of pos- 
sible peaks is n! more than in (a). Provided the parameters 
are re- labeled after the MCMC, such that parameters as- 
sociated with the lower frequency are always identified with 
planet one and vice versa, the two cases are equivalent and 
equation (|12[) is the appropriate prior for both approaches. 

Approach (b) was found to be more successful because 
in repeated blind period searches it always converged on the 
highest posterior probability distribution peak, in spite of 
the huge period search range. Approach (a) proved to be 
unsuccessful in finding the highest peak in some trials and 
in those cases where it did find the peak it required many 
more iterations. Restricting P2 ^ Pi (/i ^ /a) introduces 
an additional hurdle that appears to slow the MCMC period 
search. 

The full set of priors used in our Bayesian calculations 
are given in Table [1] Two different limits on Ki were em- 
ployed. In the first case #1, the upper limit corresponds to 
the velocity of a planet with a mass — 0.01 Mq in a circular 
orbit with our shortest period of one day. Also the upper 
bound on Pi of 1000 yr is an upper bound based on galac- 
tic tidal disruption. Previously we used an upper limit of 
three times the duration of the data. An upper bound of 
Kjnax (^^^^Y^^ was proposed at an exoplanet workshop at 
the Statistics and Applied Math Sciences Institute (spring 
2006), however, the factor of ( ^g'" ) ^ was not incorporated 
in early runs of the current analysis. We set A'max = 2129m 
s~^ , which corresponds to a maximum planet-star mass ratio 
of 0.01. 

For case #2, the upper limit on Ki was set equal to 



K 



1/3 



based on equation (|14p . 



m sin i / 2nGMt 
M, \ P 



1/3 



m 



-2/3 



(14) 



where m is the planet mass, Mt is the star's mass, and G is 
the gravitational constant. Case #2 is an improvement over 



MfH/.fL)r- 



(11) 



To date this claim has been tested for n < 3. 
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Table 1. Prior parameter probability distributions. 



Parameter 



Prior 



Lower bound Upper bound 



Orbital period Pi 



Velocity Ki 
(m s"-"-) 



V (m s-l) 
Eccentricity 



(1) Modified Jeffreys^ 

1 1 



K+Ko 



In 1 + 



(2) 



In 

Uniform 



1.0 d 







1000 yr 



(Ko = 1) Xmax = 2129 



Longitude of periastron Wi Uniform 
Extra noise s (m s^-*-) — (^+^o) 



standard deviation 



(so = 1 & 10m s-i) 



-27r 



2n 



Since the prior lower limits for K and s include zero, we used a modified Jeffreys prior of the form 

p(X\M, I) = -J— ^ (15) 

X + Xo ln(l + %J-) 

For X <S Xq, p{X\M, I) behaves like a uniform prior and for X ^ Xq it behaves like a Jeffreys prior. The 
In ( 1 + ) term in the denominator ensures that the prior is normalized in the interval to Xmax • 



Jfmax {—p^) because it allows the upper limit on K to 
depend on the orbital eccentricity. Clearly, the only chance 
we have of detecting an orbital period of 1000 yr with current 
data sets is if the eccentricity is close to one and we are lucky 
enough to capture periastron passage. Prior ^2 was used in 
this work with the exception of Section[7]on model selection. 
In that section, some results were obtained with prior #1 
and the rest with prior #2, as indicated in the text. 

Three of the models considered in this paper. Mo (no 
planet). Mi (one planet), and M2 (two planet model), in- 
corporate an extra noise parameter, s, that can allow for 
any additional noise beyond the known measurement uncer- 
tainties[f|. We assume the noise variance is finite and adopt 
a Gaussian distribution with a variance s^. Thus, the com- 
bination of the known errors and extra noise has a Gaus- 
sian distribution with variance = af + s'^, where ai is the 
standard deviation of the known noise for i"' data point. 
For example, suppose that the star actually has two plan- 
ets, and the model assumes only one is present. In regard 
to the single planet model, the velocity variations induced 
by the unknown second planet acts like an additional un- 
known noise term. Other factors like star spots and chro- 

^ In the absence of detailed knowledge of the sampling distribu- 
tion for the extra noise, we pick a Gaussian because, for any given 
finite noise variance, it is the distribution with the largest uncer- 
tainty as me asured by the entropy, i.e., the maximum entropy 
distribution l l JavneElll957l . lGregor\ll2005al section 8.7.4.) 



mospheric activity can also contribute to this extra veloc- 
ity noise term which is often referred to as stellar jitter. 
Several researchers have attempted to estimate stellar jit- 
ter for individual stars based on statistical correlations with 
observables (e.g., [Saar & Donahue 1997, Saar et al. 199i, 
IWrightllioOsI ). In general, nature is more complicated than 
our model and known noise terms. Marginalizing s has the 
desirable effect of treating anything in the data that can't 
be explained by the model and known measurement errors 
as noise, leading to conservative estim ates of orbital par am- 
eters (see Sections 9.2.3 and 9.2.4 of iGregorvl HooHi) for 
a tutorial demonstration of this point). If there is no extra 
noise then the posterior probability distribution for s will 
peak at s = 0. The upper limit on s was set equal to /Cmax- 
For the old data set we employed a modified Jeffrey's prior 
with a knee, so — Im s~^. For the new data we carried out 
the calculations for two different choices, namely, so = 1 and 
So = 10m s~^. 

We also consider a fourth model Mi . , a one planet 
model with a Gaussian prior for the extra noise parameter 
s of the form 

p(s|A/i^.,J)^fcjexp (- ^'20-1"^' )' 

where kj is the normalization constant given by 

kj = 1/ /"""""'"exp Lil^^ys. (17) 
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Figure 1. The new data is shown in panel (a) and the best fitting 
two planet (Pi = 129.8 day, P2 = 908 day) model versus time is 
shown in (b). Panel (c) shows the residuals. 



For this model we se t Sg — 5.4m s ^, th e jitter estimate for 
HD 208487 given bv lButler et all l|2006l ), based on Wright's 
(2005) jitter modeling. We set as = 3m s~^ as an estimate 
of the uncertainty in the value of Sa- 



6 RESULTS 

As mentioned in the introduction, the initial analysis was 
carried out using the 31 radial velocity measurements from 
Tinney et al. (2005) who reported the detection of a single 
planet with M sin i = 0.45 ±0.05 in a 130 ± 1 day orbit with 
an eccentricity of 0.32±0.1. Figure[T]show s the new precision 
radial velocity data for HD 208487 from , Butler et al.l (|2006l ) 
who reported a single planet with M sin i = 0.52 ± 0.08 in a 
130.08±0.51 day orbit with an eccentricity of 0.24±0.16. The 
additional data points are the last 4 shown in the figure. The 
new pipeline resulted in changes to the values of the original 
31 data points. The largest change after allowance for the 
different means (different zero points) was 6.9 m s~^ or 1.3 



iGregorvl (|2005d ) reported results from a preliminary re- 
analysis of the old data set which indicated a second planet 
with a period of « 1000 days. Panels (b) and (c) show the 
best fitting two planet light curve and residuals based on the 
more detailed analysis of the new data which is presented in 
this paper. 

Figure [2] shows fH — 1 MCMC iterations for each of the 
parameters starting from a specific but arbitrary initial lo- 



cation in parameter space of Pi = 50 d, ei = 0.3, V = 2.0 
ms~^, ipi = 2.0 radians, Ki = 20 ms~^, (j>i — 0.0 radi- 
ans, P2 = 700 d, 62 = 0.1, V2 = 2.0 radians, K2 = 15 
ms~^, (j}2 — 0.0 radians, s = 3 ms^^. A total of 1.8 mil- 
lion iterations were used but only every sixth iteration was 
stored. For display purposes only every two hundredth point 
is plotted in the figure. The burn-in period of approximately 
40,000 iterations is clearly discernable. The dominant solu- 
tion corresponds to Pi = 129. 8d and P2 — 908d, but there 
are occasional jumps to other remote periods demonstrat- 
ing that the parallel tempering algorithm is exploring the 
full prior range in search of other peaks in the target pos- 
terior distribution. The Xi s^nd uji traces were derived from 
the corresponding tpi,<pi traces using equation Q. Figure [3] 
shows the post burn-in iterations for a window in period 
space (125 ^ Pi ^ 135d and 650 Pi ^ 1200d) that 
isolates the dominant peak. All the traces appear to have 
achieved an equilibrium distribution. There is a weak corre- 
lation between P2 and £2 and weak correlation tail evident 
between K2, 62, and V. These correlations are shown better 
in the joint marginal distributions for 6 pairs of parameters 
in Figure m Each dot is the result from one iteration. The 
lower four panels of this plot nicely illustrate the advantage 
of the using the tpi — 2TTXi + i^i and (pi = 2TTXi — i^i re- 
parameterization, which are essentially uncorrelated in com- 
parison to the Xi and uii. 

The Gelmen-Rubin statistic is typically used to test for 
convergence of the parameter distributions. In parallel tem- 
pering MCMC, new widely separated parameter values are 
passed up the line to the fH — 1 simulation and are occa- 
sionally accepted. Roughly every 100 iterations the /3 = 1 
simulation accepts a swap proposal from its neighboring sim- 
ulation. If the transition is to a location in parameter space 
that is very remote from the dominant solution, then the 
P — 1 simulation will have to wait for another swap to return 
it to the dominant peak region. One example of such a swap- 
ping operation is shown for the P2 parameter in Figure (5) Of 
course most swaps are to locations within the equilibrium 
distribution of the dominant peak. The final /3 = 1 simula- 
tion is thus an average of a very large number of independent 
/3 = 1 simulations. What we have done is divide the /3 = 1 
iterations into ten equal time intervals and inter compared 
the ten different essentially independent average distribu- 
tions for each parameter using a Gelmen-Rubin test. For 
all of the two planet model parameters the Gelmen-Rubin 
statistic was 1.02. 

Figure [6] shows the individual parameter marginal dis- 
tributions for the two planet model dominant solution. For 
comparison purposes, the marginals distributions for the one 
planet model are shown in Figure [7] Table [2] compares our 
Bayesian one planet orbital parameter values and their er- 
rors, for the two different choices of so, to the values from 
a) Tinney et al. (2005) and b) Butler et al. (2006). The pa- 
rameter values given for our analysis are the median of the 
marginal probability distribution for the parameter in ques- 
tion and the error bars identify the boundaries of the 68.3% 
credible region. The value immediately below in parenthesis 
is the MAP value, the value at the maximum of the joint pos- 
terior probability distribution. It is clear from Table [2] that 
changing so from 1 to 10m s~^ did not significantly alter the 
Ml parameter estimates. The values derived for the semi- 
major axis and M sin i, and their errors, are based on the as- 
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Figure 2. MCMC parameter iterations. Tlie upper left panel is 
a plot of the prior X likelihood, and the upper right panel shows 
a blow-up of the y-axis after dropping the first 10,000 iterations. 



sumed mass of the star = 1 .05±0.12 Mq (jValenti fc Fisched 
120051 ). iTinnov ct all (|200EJ'I assu med a mass of = 0.95 ±0.05 
M0, while Butle r at all II2006I) assumed a mass of = 1.13 
Mq but also quote IValenti fc Fischer! (|2005| ) as the refer- 
ence. 

Table|3]gives our Bayesian two planet orbital parameter 
values and their errors for the two different choices of sq. 
Apart from the s parameter, changing so from 1 to 10m s~ 
did not significantly alter the M2 parameter estimates. We 
note that the MAP value for P2 falls just outside the 68.3% 
credible region. Finally, Panel (a) of Figure|8]shows the data, 
minus the best fitting P2 orbit, for two cycles of Pi phase. 
The best fitting Pi orbit is overlaid. Panel (b) shows the 
data plotted versus P2 phase with the best fitting Pi orbit 
removed. The reduced = 1-01 for the best M2 model fit 



Figure 3. Post burn-in MCMC iterations for a window in period 
space (125 < Pi ^ 135d and 650 ^ Pi ^ 1200d) that isolates the 
dominant peak. 



when no jitter is assumed. The reduced = 2.33 for the 
best Ml model fit with no jitter, and 1.13 when a jitter of 
5.4m s~^ is assumed. 



7 MODEL SELECTION 

To compare the posterior probabilities of the two planet 
model to the one planet models we need to evaluate the 
odds ratio, O21 = p{M2\D, I)/p{Mi\D, I), the ratio of the 
posterior probability of model M2 to model Mi . Application 
of Bayes's theorem leads to, 



O21 



p{M2\I) p{D\M2,I) ^ p{M2\I) 
p{Mi\I) p(D\Mi,I) - p{Mi\I) 



B2 



(18) 
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Table 2. One planet model parameter estimates. 



Parameter Tinney et al. Butler et al. 

(2005) (2006) 



Model Mx . 



Model M-i 
(so = Im s^i) 



Model Ml 

(so = 10m s-i) 



P(d) 



130 ± 1 



130.08 ±0.51 



129.97+J; 
(129.97) 



129.98+[];4J 
(130.08) 



129.98i«;« 
(130.08) 



K (m s-l) 



20 ±2 



19.7 ±3.6 



+ 1.9 



19.3 
(19.6)" 



19.3+2 
(19.6) 



+2.1 



19.2 
(19.4)' 



0.32 ±0.10 0.24 ±0.16 



0.221': 
(0.24)' 



0.22+ J 
(0.24)' 



0.221, 
(0.23) 



u) (deg) 



126 ±40 113 



1131^ 
(116)' 



11415 
(111) 



(113) 



a (AU) 



0.49 ±0.04 0.52 ±0.03 



0.511, 
(0.51)' 



0.511 ( 
(0.51)' 



0.511 ( 
(0.51)' 



M sin i(Mj) 0.45 ± 0.05 0.52 ± 0.08 



0.481;; 
(0.49) 



0.481;; 
(0.49) 



0.481;; 
(0.49) 



Periastron 
passage 

(JD - 2,440,000) 



11002.8 ± 10 10999 ± 15 



llOOlIi: 
(10001) 



iioooii 

(10999) 



iiooii; 

(10999) 



(m s ^) 



5.9" 
(4.5) 



-1.2 
-1.4 



5.6"' 



K+l-4 
-1.5 

(4.6) 



(4.4) 



H.4 

-1.5 



RMS residuals 7.2 
(m s-i) 



8.2 



7.5 



7.5 



7.5 



Table 3. Two planet model parameter estimates. 



Parameter Model M2 (sq = Im s ^) 

planet 1 planet 2 



Model M2 (so = 10m s'^) 
planet 1 planet 2 



P(d) 

K (m s-i) 



129. 8I0 
(129.9) 



16.5ll;| 
(16.4) 



908^ 



a+81 
-94 

(1001) 



(12.8) 



129 
(129.9) 



+0.4 
-0.4 



I6.5; 
(16.4) 



-1.6 
-1.5 



909"* 



;i + 82 

-92 
(1001) 



(12.8) 



0.211, 
(0.19)' 



0.38I'; 
(0.61)" 



0.20; 

(0.19) 



-.09 
.09 



0.37: 
(0.61) 



.26 
.20 



u) (deg) 
a (AU) 
M sinj (Mj) 



123l« 
(99.7) 

0.5ll;«^ 
(0.510) 

0.4131;, 
(0.414) 



227lg 
(255) 

1.87j:;i^ 
(1.991)" 

o.45i;;^ 

(0.513) 



121 



+49 
38 



(99.5) 



0.51" 



-.02 
-.02 

(0.510) 

Q41+.05 

(0.414) 



226; 
(255) 



-52 
-52 



1.87" 



(1.991) 



.13 
.14 



0.45" 



(0.513) 



.11 

.13 



Periastron 
passage 

(JD - 2,440,000) 

s (m s"-') 

RMS residuals 
(m s~^) 



11008113 
(10995) 

{o.o)ii;^ 

4.4 



10605li 
(10512) 



11007; 



-14 



10601"* 



(10998) (10524) 

(o.o)t^;g 

4.4 
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Figure 4. Joint marginals for various pairs of parameter values. 




Figure 5. A blow-up of a small range of parameter P2 iterations 
illustrating a transition to a location that is very remote from 
the dominant solution that occasionally results from the parallel 
tempering swap operation. This allows the algorithm to widely 
explore the full prior parameter space in search of other significant 
peaks. 




128 1285 129 1295 130 130.5 131 131.5 

Pi Idl 

0.35 




Figure 6. Marginal parameter probability distributions for the 
two planet model for the new data set and so = Im s~^. 



where the first factor is the prior odds ratio, and the sec- 
ond factor is called the Bayes factor. The Bayes factor is 
the ratio of the marginal (global) likelihoods of the mod- 
els. The MCMC algorithm produces samples which are in 
proportion to the posterior probability distribution which 
is fine for parameter estimation but one needs the propor- 
tionalit y constant fo r estimating the model marginal like- 
lihood. IClvdel (|2006l ) recently reviewed the state of tech- 
niques for model select ion from a statistics perspective and 
iFord fc Gregory! (|2006l ) have evaluated the performance of 
a variety of marginal likelihood estimators in the extrasolar 
planet context. 

In this work we will compare the results from three 
marginal likelihood estimators: (a) parallel tempering, (b) 
ratio estimator, and (c) restricted Monte Carlo. The analysis 
presented in Section 17.11 17.21 and 17.31 is based on the old 
data set (Tinney et al. 2005) and prior #1. These results 
are summarized in Section [7. 41 together with model selection 



results for the new data set (Butler et al. 2006) using prior 
#2. 

7.1 Parallel tempering estimator 

The MCMC samples from all (rifs) simulations can be used 
to calculate t he marginal likeli hood of a model according to 
equation (P^lGregor^ (|2005al '). 

Hp{D\Ah,I)] = J dl3{Hp{D\Ah,X,I)]}^, (19) 

where i — 0,1,2 corresponds to a zero, one or two planet 
model, and X represent a vector of the model parameters 
which includes the extra Gaussian noise parameter s. In 
words, for each of the parallel simulations, compute the 
expectation value (average) of the natural logarithm of the 
likelihood for post burn-in MCMC samples. It is necessary 
to use a sufficient number of tempering levels that we can 
estimate the above integral by interpolating values of 
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Figure 7. Marginal parameter probability distributions for the 
one planet model for the new data set and so = Im s~^. 




Figure 8. Panel (a) shows the data, with the best fitting P2 
orbit subtracted, for two cycles of Pi phase with the best fitting 
Pi orbit overlaid. Panel (b) shows the data plotted versus P2 
phase with the best fitting Pi orbit removed. 



{\n[p{D\M,.,X,I)])0 = i Vln[p(D|Af.,X,/)]^, 

71. ^ ^ 



(20) 



in the interval from /3 = to 1, from tlie finite set. For this 
problem we used 34 tempering in the range /3 = 10~* to 1.0. 
Figure m shows a plot of {\n[p{D\Mi, X, I)\) p versus /3. The 
inset shows a blow-up of the range j3 — 0.1 to 1.0. 

The relative importance of different decades of /3 can 
be judged from Table [J] The second column gives the frac- 
tional error that would result if this decade of f3 was not 
included and thus indicates the sensitivity of the result to 




Figure 9. A plot of {hi[p{D\A4i, X, I)]) p versus /3. The inset 
shows a blow-up of the range /3 = 0.1 to 1.0. 



Table 4. Fractional error versus /3 for the results shown in Figure |9] 



/3 ran; 






Fractional error 


1.0 - 


10-1 




1.55 X 10"* 


10-1 


- 10" 


2 


7.46 X 10*^ 


10-2 


- 10- 


3 


15.2 


10-3 


- 10- 


4 


1.17 


10-* 


- 10- 


5 


0.52 


10-^ 


- 10- 


6 


0.35 


10-*^ 


- 10- 


7 


0.16 


10-7 


- 10- 


8 


0.02 



that decade. In iFord fc Greeonl (|2006l l we constructed a 
similar table for the one planet system HD 88133 only we 
investigated a wider range of /3 values down to IQ-n. The 
fractional errors for the f3 range lO"'' to 10"* were very 
similar to those given in Table ID The fractional error for 
P = 10-" - IQ-* was only 0.002 and consequently this 
range of /3 can safely be ignored. 

Figure [10] shows a plot of the parallel tempering 
marginal likelihood versus iteration number for two differ- 
ent MCMC runs for the two planet model. The solid gray 
curve shows the result for the second ratio estimator method 
which is discussed below. All three curves converge to the 
same value but the ratio estimator converges much more 
rapidly. The final parallel tempering marginal likelihood es- 
timates from the two runs were 1.5 x 10-^'* and 3.4 x 10-^'*, 
yielding an average value of 2.4 x lO"^''. 




250000 500000 750000 1x10'^ 1.25x10" 1.5x10" 
Iterations 



Figure 10. A plot of the parallel tempering marginal likelihood 
versus iteration number (dashed curves) for two different MCMC 
runs for the two planet model. The solid gray curve shows the 
result for the ratio estimator method which is discussed in Sec- 
tion [721 
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7.2 Marginal likelihood ratio estimator 

Our s econd method was introduced by iFord fc Gregory! 
lj2006h . It makes use of an additional sampling distribution 
h{X). Our starting point is Bayes' theorem 



p{X\Ah,I) = 



piX\M„I)p{D\M„X,I) 
p{D\M„I) 



(21) 



Re-arranging the terms and multiplying both sides by h{X) 
we obtain 

piD\M„I)piX\M,,I)hiX) = 

piX\Mi,I)p{D\Mi,X,I)h{X). (22) 

Integrate both sides over the prior range for X. 



p{D\M„I)re / p{X\M,,I)h{X)dX - 



p{X\M„ I)p(D\Mi,X, I)h[X)dX. 



(23) 



The ratio estimator of the marginal likelihood, which we 
designate by p[D\Mi, I)re, is given by 



piD\M,,I)r 



f p[X\M^,I)p(D\Mr,XJ)h(X)dX 
/ p{X\M,J)h{X)dX ' 



(24) 



To obtain the marginal likelihood ratio estimator, 
p{D\Mi, I)re, we approximate the numerator by drawing 
samples X'^, • • • , from h{X) and approximate the 
denominator by drawing samples X^ , X^ , ■ • ■ , X"" from the 
(3 = 1 MCMC post burn-in iterations. The arbitrary func- 
tion h{X) was set equal to a multinormal with a covariance 
matrix equal to twice the covariance matrix computed from 
a sample of the /3 = 1 MCMC output. We usedQ n'^ = 10^ 
and Us from 10* to 2 x 10^. Some of the samples from a 
multinormal h{X) can have nonphysical parameter values 
(e.g. K < 0). Rejecting all nonphysical samples corresponds 
to sampling from a truncated multinormal. The factor re- 
quired to normalize the truncated multinormal is just the 
ratio of the total number of samples from the full multinor- 
mal to the number of physical valid samples. Of course we 
need to use the same truncated multinormal in the denom- 
inator of equation (|24p so the normalization factor cancels. 
From Figures l9l and [TOl it is clear that the p{D\M2, 1)re con- 
verges much more rapidly than the parallel tempering esti- 
mator and the parallel tempering estimator, p[D\M2,I)pT, 
required 34 /3 simulations instead of one. The final values of 
p{D\Mi,I)re for models M2 and Mi were 2.0 X 10"'^'' and 
3.3 X 10"^". The solid gray curve in Figure fTUl shows marginal 
likelihood ratio estimator versus iteration number for model 
M2. It lies between the two parallel tempering convergence 
values. 



7.3 Restricted Monte Carlo marginal likelihood 
estimate 

We can also make use of Monte Carlo integration to evaluate 
the marginal likelihood as given by equation (|25[). 



Table 5. Marginal likelihoods for old data set (Tinney et al. 2005) 
and prior #1. 



Model 


so 


Marginal 


Bayes 




(m s-i) 


Likelihood 


factor 


Mo 


1.0 


1.60 X 10-''0 


(76 ± 7) X 10-'5 


Ml 


1.0 


(3.10 ±0.3) X 10"^^ 


1.0 


M2 


1.0 


(2.2 ±0.2) X 10"^'' 


(71 ± 9) 



p{D\M,,I)= / p{X\M,,I)p{D\Mi,X,I)dX. 



* According to lFord fc Greeorvl ||2006D . the numerator converges 
more rapidly than the denominator. 



(25) 



Monte Carlo (MC) integration can be very inefficient in 
exploring the whole prior parameter range, but once we 
have established the significant regions of parameter space 
with the MCMC results, this is no longer the case. The 
outer borders of the MCMC marginal parameter distribu- 
tions were used to delineate the boundaries of the volume 
of parameter space to be used in the Monte Carlo inte- 
gration. RMC integration using 10^ samples was repeated 
three times for the two planet model. The results were 
1.6 ± 2.6, 1.1 ± 1.2,2.3 ± 0.6 x lO"'^" yielding a weighted 
average of 2.1 ± 0.1 x lO"'^"'. The weighted average of RMC 
repeats for the one planet model was 2.9 ± 0.1 x 10~^^. 



7.4 Summary of model selection results 

Table [5] summarizes the marginal likelihoods and Bayes fac- 
tors comparing models Mo and M2 to Afi for the old data 
set analyzed using prior #1. For model Mo, the marginal 
likelihood was obtained by numerical integration. For Mi, 
the value and error estimate are based on the ratio esti- 
mator and RMC methods. For model M2, the two parallel 
tempering marginal likelihood estimates differed by a factor 
of ~ 2, although their average agreed within 20% with the 
values obtained by the ratio estimator and RMC methods. 
Combining the three results yielded a marginal likelihood 
good to ~ ±10%. The conclusion is that the Bayes factor 
strongly favors a two planet model compared to the other 
two models. 

Table in] gives the same information for the new data set 
analyzed using the improved prior #2 for both choices of so 
(1 and 10m s"'^). The improved data significantly strength- 
ens the case for the existence of the second planet as judged 
by the value of -621. Since the value of i32i decreases as we 
increase the value of sq, what argument can we give for not 
using a very much larger value, say so = 1000. This would 
effectively convert our Jeffreys prior into a uniform prior. 
For a uniform prior, the probability in the decade s = 100 
to 1000m s^^ is 100 times the probability in the decade from 
1 to 10m s^^. Thus, a uniform prior assumes that it is much 
more likely that s is very large than it is very small. We ar- 
gue that the scale invariant property of the Jeffreys prior is 
much more consistent with our prior state of knowledge for 
the current problem. For model selection purposes we want 
to employ the smallest value of so that avoids the divergence 
that would otherwise occur at s = 0. For the current prob- 
lem, a value of so = 10m s~^ is large enough to achieve this 
goal. 

Table |6] gives values of B21 and Boi for the two cases. 
So = 1 and 10m s~^. Table |6] also includes an entry for 
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Table 6. Marginal likelihoods for new data set (Butler et al. 
2006) and prior #2. 



Model 




Marginal 






Bayes 




(m s~-^) 


Likelihood 






factor 


Mo 


1.0 


1.56 X 10~^* 






(38 ± 4) X 10-® 


Ml 


1.0 


(4.08 ± 0.39) X 


10" 


-64 


1.0 


Ma 


1.0 


(9.54 ±0.87) X 


10- 


-62 


234 ± 31 


Mi^ 




(2.44 ± 0.24) X 


10- 


-63 




Mo 


10.0 


1.44 X 10-''** 






(58 ±6) X IQ-^ 


Ml 


10.0 


(2.47 ±0.25) X 


10- 


-64 


1.0 


M2 


10.0 


(4.38 ±0.69) X 


10- 


-62 


177 ± 33 



Ml - , the one planet plus stellar jitter model (introduced at 
the end of section [S} . It is not as simple to compare mod- 
els Mij to M2 because Mi^ makes use of additional prior 
information concerning stellar velocity noise or jitter. One 
approximate way to make use of this information for model 
M2 would be to set the prior upper bound on our extra 
noise parameter at a few times the jitter estimate, say 15m 
s-^. Reducing the prior upper bound on s from 2129m s^^ 
to 15m s-^ reduces the Occam penalty associated with the 
unknown s parameter by a factor of « 6 for so = 10m s-^. 
Thus for So — 10 m s-^, the marginal likelihood for M2 
would need to be increased by a factor of ~ 6 before com- 
paring with the likelihood for Mi^ . The corresponding Bayes 
factor is ~ 108. 

Following ICummin3 (|2004l ). we compute a Bayesian 
false alarm probability. For this purpose we will restrict the 
hypothesis space of interest to just two models Mi and M2. 
In this context, the Bayesian false alarm probability, F, is 
the probability there is really only one planet given the data 
D and our prior information /. In this case the probability 
of the data is 

p{D\I) ^ p{Ah\I)p{D\Ah,I) +p{M2\I)piD\M2,I). (26) 
Combine this with Bayes theorem 

p{Ah\D,I)=p{Ah\I)p{D\Ah,I)/p{D\I), (27) 
to obtain 

F=piAI,\D,I) = ,(M.|.).(i^|M.,.) = TTb^,' (28) 

^ p(Mi\I) p(D\Mi,I) 

where we have assumed p(Mi|7) — p{M2\I). According to 
equation (|28p . a Bayes factor B21 « 177 corresponds to a 
false alarm probability of 0.006. 



8 DISCUSSION 

One source of error in the measured velocities is jitter, which 
is due in part to flows and inhomogeneities on the stellar sur- 
face. Wright (2005) gives a model that estimates, to within 
a factor of roughly 2 (|Butler et al.|[200^ . the jitter for a star 
based upon a stars activity, colo r, TeflF, and h e ight a bove the 
main sequence. For HD 208487, iButler et all (|2006l ) quote a 
jitter estimate of 5.4m , based on Wright's model. Our 
models Ado, AIi and M2 employ instead an extra Gaussian 
noise nuisance parameter, s, with a prior upper bound of 
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Figure 11. A comparison of the marginal probability distribu- 
tions for the extra noise parameter, s, for two different values of 
the prior knee, sq for model M2. The dashed curve corresponds 
to so = 1.0m s— ^ and the solid curve to so = 10.0m s-^. 

equal to ifmax = 2129m . Anything that cannot be ex- 
plained by the model and published measurement uncer- 
tainties (which do not include jitter) contributes to the ex- 
tra noise term. Of course, if we are interested in what the 
data have to say about the size of the extra noise term then 
we can readily compute the marginal posterior for s. The 
marginal for s is shown in Figures [5] and [7] for models M2 
and All, respectively. For Afi, the marginal for s shows a 
pronounced peak with a median value o f 5.6m s"^, which is 
very close to the jitter estimate given in iButler et al] (|2006l ) 
based on Wright's model. 

For AI2 , the marginal for s shows a sharp peak at a value 
of s = Om s~^. We can determine whether the posterior 
shape was strongly influenced by our choice of knee (so = 
1.0m s-^) in the modified Jefi'reys prior for s, by comparing 
with the results obtained assuming so — lOm . The two 
marginal posteriors are shown in Figure [TT] Clearly, using a 
larger so does suppress the sharp peak at s = 0, however, 
the marginal posterior still favors a value of s close to zero. 
It would appear that the velocity variations arising from 
the second planet are on the same scale as the previously 
estimated stellar jitter for HD 208487. The results of our 
Bayesian model selection analysis indicate that a two planet 
model is greater than 100 times more probable than a one 
planet model with the previously estimated jitter. Based on 
model M2, and so = 10m s~^, the 95% upper limit on the 
remaining stellar jitter is 4.2m s-^. 

Figure [12] shows a comparison of the marginal for s for 
So = Im (dashed curve) and so = 10m (solid) for 
model All. In this case, the marginal for so = 10m s-^ 
is displaced very slightly to larger values of s, as expected 
for a more uniform prior, but the shift is negligible when 
compared to the uncertainty in the parameter value. 

It is interesting to compare the measured one a width 
of the marginal period PDF to values estimated from equa- 
tion Q after multiplying by to convert to 5f to 5P. 
We have used the ratio of to the mean velocity error 

as an estimate of the S/N for use in equation ([5]). For Pi 
the predicted SPprod = 0.31, whereas the measured value 
SPmcas ~ 0.4. Of course, the effect of any correlations be- 
tween parameters is to broaden the marginal distribution, so 
if anything we expect the measured value to be broader. In 
the case of P2, 5Pmeas ~ 1.8 X 5Pprod. In this case both the 
shape of the marginal and the P2 versus 62 correlation dia- 
gram of Figure |4] suggest that the data allow for two closely 
blended solutions in the vicinity of 900d. The marginals for 
X2 and LJ2 also show evidence for a blend of two components. 
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Figure 12. A comparison of the marginal probability distribu- 
tions for the extra noise parameter, s, for two different values of 
the prior knee, sq for model Mi. The dashed curve corresponds 
to so = 1-Om and the solid curve to so = 10.0m s~^. 

A preliminary three planet model analysis failed to detect a 
third planet. 

A rough estimate of the expected width of the marginal 
velocity PDF is equal to the mean velocity error divided by 
the \/N, or 0.9m s~^, where A'^ is the number of data points. 
In comparison the measured widths are 1.6m s~^ for Pi and 
2.5m s"^ for P2. 

In section [5] we discussed two different strategies to 
search the orbital frequency parameter space for a multi- 
planet model: (a) an upper bound on /i ^ /2 ^ ■ • ■ ^ /n is 
utilized to maintain the identity of the frequencies, and (b) 
all fi are allowed to roam over the entire frequency range and 
the parameters re-labeled afterwards. In this second case, 
nothing constrains to be less than fi so that degener- 
ate parameter peaks can occur. For case (b) and an n planet 
model, the number of possible peaks is n\ more than in (a). 
In this work we adopted approach (b) because in repeated 
blind frequency (period) searches it always converged on the 
highest posterior probability peak, in spite of the huge pe- 
riod search range. Approach (a) failed to locate the highest 
peak in some trials and in the cases where it succeeded it 
required many more iterations. 

Apart from their very different relative efficiency to lo- 
cate the highest probability peak, the two approaches are 
equivalent and equation (|12p is the appropriate frequency 
prior for both approaches. As a test of this claim, we ob- 
tained p(L>|M2, = 1.9x 10"^'' from an MCMC run using 
approach (a), the old data set, and prior #1. This compares 
closely with the value 2.0 x 10"^'' obtained for approach (b). 
Again, from an MCMC run using prior #2 and the old data 
set, we obtained p{D\M2, 1)re = 3.9 x 10"^* for (a) and a 
value oi p{D\M2,I)re = 3.9 x 10"^* for (b). Further tests 
of this claim were carried out in a entirely different problem 
involving the detection of three spectral lines. For this prob- 
lem the marginal likelihoods computed using approaches (a) 
and (b) agreed to within 1%. 

It is interesting to compare the performance of the three 
marginal likelihood estimators employed in this work to their 
performance on another data set f or HD 88133, which in - 
volved fitting a one planet model (|Ford fc Gregorvl 120061 ). 
In the latter study, 5 separate (n/3 = 32) parallel temper- 
ing runs were carried out. After 300,000 iterations, four es- 
timates agreed within 25% while the fifth was a factor of 
two larger. Based on these two applications, it would ap- 
pear that any single parallel tempering marginal likelihood 
estimate is only accurate to a factor of 2. For HD 208487 
and a two planet model, the parallel tempering estimator 



required ~ 1.5 x 10 iterations for convergence. Of course 
to establish convergence, it is necessary to execute at least 
two separate parallel tempering runs. The chief advantage of 
parallel tempering, in respect to model selection, is that it is 
able to estimate the marginal likelihood even for multimodal 
posterior distributions. For HD 88133, both the marginal 
likelihood ratio estimator and RMC estimator yielded val- 
ues within 5% of the average of the best 4 parallel tempering 
estimates. For the 2 planet model of HD 208487 and only two 
parallel tempering runs, the three estimators agreed within 
20%, with the final estimate considered accurate to ±10%. 
For more information on these and additiona l marginal like- 
lihood estimators see iFord fc Gregoni l|2006h . 



9 CONCLUSIONS 

In this paper we demonstrate the capabilities of an auto- 
mated Bayesian parallel tempering MCMC approach to the 
analysis of precision radial velocities. The method is called 
a Bayesian Kepler periodogram because it is ideally suited 
for detecting signals that are consistent with Kepler's laws. 
However, it is more than a periodogram because it also pro- 
vides full marginal posterior distributions for all the orbital 
parameters that can be extracted from radial velocity data. 
The periodogram employs an alternative method for con- 
verting the time of an observation to true anomaly that en- 
ables it to handle much larger data sets without a significant 
increase in computation time. 

A preliminary re-analysis of t he old data for HD 208487 
found evidence for a second planet (|Gregorvi2005 c) . This has 
now been confirmed by the current analysis based on the re- 
sults of the improved Kepler periodogram and the new data 
set of Butler et al. (2006). The velocity variations arising 
from the second planet are on the same scale as the previ- 
ously estimated stellar jitter for HD 208487. Based on our 
two planet model results, the 95% upper limit on stellar 
jitter is 4.2m s~^. 

We also derived the form of joint frequency prior to use 
for a multiple planet search which is given by 

p(/i,/2,---/nlM„,/) = — (29) 

/1/2 ■ ■ ■ /n[ln(/H//L)]'^ 

where n is the number of planets. There are two frequency 
search strategies: (a) constrain the frequencies such that 
f L < fi < f2 <■■■< fn < fn, 01 (b) allow each frequency 
to be unconstrained and re-label them afterwards. In prac- 
tice we found that case (b) was significantly more successful 
than (a) for blind searches. In both cases equation (|29p is 
the correct joint frequency prior. 

Considerable attention was paid to the topic of Bayesian 
model selection. Based on current research, it appears nec- 
essary to employ more than one method for estimating 
marginal likelihoods in order to obtain results that are ac- 
curate at the 10% level. 
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